Add support for Coqui TTS #59
@p0n1 do you have time to give your thoughts on this PR?
No probs! Hope for the best 💪🏼
Have you tried building the Docker image from the Dockerfile with this change? I checked out your repository, but the image is apparently missing gcc and the Rust compiler. I think another image is needed to install TTS in the Docker image.
It would be nice if you added a link in the README to a guide on setting up Coqui, so I can set it up and test it before merging.
 def get_tts_provider(config) -> BaseTTSProvider:
     if config.tts == TTS_AZURE:
-        from audiobook_generator.tts_providers.azure_tts_provider import AzureTTSProvider
+        from audiobook_generator.tts_providers.azure_tts_provider import \
no functional change, just cosmetics, not needed
         return AzureTTSProvider(config)
     elif config.tts == TTS_OPENAI:
-        from audiobook_generator.tts_providers.openai_tts_provider import OpenAITTSProvider
+        from audiobook_generator.tts_providers.openai_tts_provider import \
no functional change, just cosmetics, not needed
         return OpenAITTSProvider(config)
     elif config.tts == TTS_EDGE:
-        from audiobook_generator.tts_providers.edge_tts_provider import EdgeTTSProvider
+        from audiobook_generator.tts_providers.edge_tts_provider import \
no functional change, just cosmetics, not needed
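For reference, the dispatch in `get_tts_provider` could take a new provider as one more branch. What follows is a minimal, self-contained sketch and not the PR's actual code: `TTS_COQUI`, `CoquiTTSProvider`, the `Config` dataclass, and the string values of the constants are assumptions made for illustration, and the provider classes are stand-ins defined inline rather than imported inside each branch as the real function does.

```python
from dataclasses import dataclass

# Provider keys. The string values and TTS_COQUI are assumptions for this sketch.
TTS_EDGE = "edge"
TTS_COQUI = "coqui"


@dataclass
class Config:
    """Hypothetical stand-in for the project's config object."""
    tts: str


class BaseTTSProvider:
    def __init__(self, config: Config) -> None:
        self.config = config


class EdgeTTSProvider(BaseTTSProvider):
    """Stand-in for the existing Edge provider."""


class CoquiTTSProvider(BaseTTSProvider):
    """Hypothetical name for the provider this PR adds."""


def get_tts_provider(config: Config) -> BaseTTSProvider:
    # The real function imports each provider inside its branch; the
    # stand-ins above replace those imports to keep the sketch runnable.
    if config.tts == TTS_EDGE:
        return EdgeTTSProvider(config)
    elif config.tts == TTS_COQUI:
        return CoquiTTSProvider(config)
    raise ValueError(f"Unknown TTS provider: {config.tts!r}")


provider = get_tts_provider(Config(tts=TTS_COQUI))
print(type(provider).__name__)
```

Keeping the existing branches untouched, as the review comments ask, would leave the diff limited to the new branch.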
@@ -94,23 +94,23 @@ def handle_args():
         help='''
             Speaking rate of the text. Valid relative values range from -50%%(--xxx='-50%%') to +100%%.
             For negative value use format --arg=value,
-        '''
+        ''',
It is the last argument in the call, so the trailing comma is not required.
     )

     edge_tts_group.add_argument(
         "--voice_volume",
         help='''
             Volume level of the speaking voice. Valid relative values floor to -100%%.
             For negative value use format --arg=value,
-        '''
+        ''',
It is the last argument in the call, so the trailing comma is not required.
     )

     edge_tts_group.add_argument(
         "--voice_pitch",
         help='''
             Baseline pitch for the text.Valid relative values like -80Hz,+50Hz, pitch changes should be within 0.5 to 1.5 times the original audio.
             For negative value use format --arg=value,
-        '''
+        ''',
It is the last argument in the call, so the trailing comma is not required.
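The help strings in these hunks rely on two `argparse` behaviors that are easy to check in isolation: help text goes through `%`-formatting, so a literal percent sign is written as `%%`, and a value that starts with `-` has to be passed as `--arg=value`. The sketch below uses a hypothetical `demo` parser with a shortened help string, not the project's `handle_args()`. The trailing comma after `help=...` is optional in Python either way, which is why the review treats it as a style-only change.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # "demo" and the group title are placeholders for this sketch.
    parser = argparse.ArgumentParser(prog="demo")
    group = parser.add_argument_group("edge tts")
    group.add_argument(
        "--voice_rate",
        # %% renders as a single % because argparse %-formats help text.
        help="Speaking rate, for example --voice_rate=-50%% or --voice_rate=+100%%.",
    )
    return parser


parser = build_parser()

# The = form keeps argparse from reading "-50%" as another option.
args = parser.parse_args(["--voice_rate=-50%"])
print(args.voice_rate)
print(parser.format_help())
```

Passing the value as a separate token (`--voice_rate -50%`) is typically rejected, because `argparse` only treats a leading `-` as part of a value when the token looks like a plain negative number.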
As the title says, this PR adds a new provider supporting Coqui TTS.
The default model, Tacotron2, works very similarly to EdgeTTS, although it only has a single voice option for now. The strength of this provider is that it can support multiple open TTS models, some of them very powerful, such as jenny.
Another interesting feature is voice dubbing with models like XTTS V2, although for now there is a bug with sentences longer than 400 tokens. To support voice dubbing I've added a folder with 3 voice samples and defaulted to the male one. Multiple languages are also supported in this mode. As the options are different from the ones for --language, I've added a new option named --coqui_language.
For this version the provider supports the same audio formats as EdgeTTS, thanks to pydub.
Note: Coqui TTS always downloads the AI model it needs to run. The download can range from a few MB to more than 1 GB.
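One possible way to stay clear of the 400-token problem mentioned above is to split long text before handing it to the model. The following is only a sketch of that idea and not part of this PR: `split_for_tts` is a hypothetical helper, and it approximates tokens with whitespace-separated words, which will not match the model's real tokenizer, so a lower limit than 400 may be the safer choice in practice.

```python
import re

MAX_TOKENS = 400  # limit quoted in the PR description


def split_for_tts(text: str, max_tokens: int = MAX_TOKENS) -> list[str]:
    """Split text into chunks of at most max_tokens whitespace-separated words.

    Words are a rough stand-in for model tokens (an assumption of this sketch).
    """
    chunks: list[str] = []
    current: list[str] = []
    # Split after sentence-ending punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        words = sentence.split()
        # A sentence longer than the limit is cut into fixed-size windows.
        while len(words) > max_tokens:
            if current:
                chunks.append(" ".join(current))
                current = []
            chunks.append(" ".join(words[:max_tokens]))
            words = words[max_tokens:]
        # Start a new chunk when this sentence would overflow the current one.
        if len(current) + len(words) > max_tokens:
            chunks.append(" ".join(current))
            current = []
        current.extend(words)
    if current:
        chunks.append(" ".join(current))
    return chunks


for chunk in split_for_tts("a b c. d e f g h. i", max_tokens=3):
    print(chunk)
```

Each chunk could then be synthesized separately and the resulting audio concatenated, at the cost of possible prosody breaks where an over-long sentence is cut.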